Examining the association between HIV prevalence and socioeconomic factors among young people in Zambia: Do neighbourhood contextual effects play a role?

Background The study examined the association between HIV infection and individual and neighbourhood-level socioeconomic factors in Zambia. Methods We used multilevel mixed effects logistic regression to examine the association of individual and neighbourhood-level variables with HIV prevalence, based on data from the 2013–14 and 2018 Zambia Demographic and Health Surveys, which are population-based cross-sectional surveys. The analysis was restricted to young people (15–24 years) with HIV serostatus results (n = 11,571 and n = 10,154). HIV serostatus was the outcome variable and socioeconomic status was measured by wealth, education and employment. Results Overall, at individual level, education was associated with reduced odds of HIV infection among young women and men. Conversely, relative wealth was generally associated with increased odds of infection for both young women and men. Young employed men had lower odds of HIV infection than the unemployed. Living in neighbourhoods with a higher average level of education was associated with higher odds of HIV infection. In 2013–14, 13% and 11% of the variation in HIV infection among young men and women, respectively, was attributed to neighbourhoods, while 20% and 11% of the variation was attributed to neighbourhoods in 2018. Inclusion of individual and neighbourhood variables in the full regression model accounted for 65.7% and 59.5% of explained variance in 2013–14 and 64.6% and 44.3% in 2018, for women and men, respectively. This reduced unexplained variance by an average of 56% in 2013–14 and 29% in 2018. Conclusion We found that HIV infection among young people in Zambia is more strongly associated with individual-level socioeconomic factors than with neighbourhood factors. Individual-level education remains an important socioeconomic factor associated with reduced odds of HIV infection. This suggests that the HIV response in Zambia should still focus on individual-level prevention strategies.


Introduction
other studies conducted in Zimbabwe, South Africa and other sub-Saharan African countries [8][9][10]. In a 2013 study, Magadi's analysis of data from 20 SSA countries collected between 2003 and 2008 revealed that the urban poor in SSA had significantly higher odds of HIV infection than their non-poor counterparts [8]. Lopman et al., in a 1998 to 2003 population-based open cohort study in Eastern Zimbabwe, also found that HIV incidence was lower in the higher wealth tercile [9]. Likewise, a 2003 to 2005 longitudinal population-based study in rural South Africa by Bärnighausen et al. revealed that individuals living in households with middle-level relative wealth were at the highest risk of HIV acquisition [10]. The mixed findings may be attributed to the phase of the epidemic and to contextual and neighbourhood-level factors. According to Gillespie et al., whose review covered studies conducted between 2004 and 2007 in sub-Saharan African countries with high HIV prevalence, accounting for underlying contextual factors such as urban/rural residence, community wealth and mobility reduced the positive association between wealth status and HIV [11].
The association between employment and HIV prevalence also shows inconsistent patterns. Msisha et al., investigating socioeconomic status and HIV seroprevalence in Tanzania based on a study conducted in 2003-04, found that employment status was positively associated with high HIV prevalence. Men and women employed in professional (i.e., non-agricultural or manual-related) occupations were at a higher risk of infection compared to those in the agricultural sector. However, for men the association changed after adjusting for other covariates, with unemployed men having a three-fold higher risk of being infected compared to those in the agricultural sector [12]. Conversely, the association remained the same for women after adjusting for other covariates, with professional occupations still associated with high HIV prevalence. Bunyasi and Coetzee found no significant association between employment and HIV prevalence in their study based on the 2007 and 2008 Prevention of Mother to Child Transmission of HIV, Effectiveness in Africa and Research Linkages to HIV Care survey (PEARL study) in South Africa [13].
Education in general appears to be protective with regard to HIV risk [10,14,15,16]. Bärnighausen et al., in a 2003 to 2005 study, found that educational attainment significantly reduced the hazard of becoming infected in a rural community in South Africa [10]. Investigating change over time in HIV prevalence by educational attainment in selected urban and rural communities in Zambia, Michelo et al. found that there was a shift towards reduced risk of HIV infection in subgroups with higher education from 1995 to 2003 [15]. The dominant pattern in many SSA countries is that the association has transitioned from a higher risk of HIV infection among the educated in the early phase of the epidemic to a lower risk of infection among the more highly educated as the epidemic matures. According to Hargreaves et al., studies conducted from 1996 onwards were more likely to find a lower risk of HIV infection among the most educated [16].
Most studies on factors associated with HIV infection have primarily focused on individual-level characteristics, but neighbourhood-level factors can also be independently associated with HIV infection [11]. A systematic review by Peterson et al., evaluating the influence of contextual factors on HIV/AIDS, sexually transmitted infections, and risky sexual behaviour in sub-Saharan Africa, based on studies conducted from 1995 to 2017, showed contextual factors to be significantly associated with HIV infection [17]. Three of the included studies found associations between neighbourhood-level poverty and likelihood of HIV positivity; of these, one found that women living in communities with lower socioeconomic status (SES) had a higher risk of HIV infection. In contrast, some studies found a lower likelihood of HIV infection in poorer communities [17]. Two studies conducted among young women in selected communities in Zambia between 1997-1998 and 2003 found that neighbourhood-level factors were as strongly associated with HIV infection as individual factors [14,18]. A similar analysis of both individual and neighbourhood-level socioeconomic factors in relation to HIV risk has not yet been done at national level in Zambia. Our study extends previous research by using nationally representative data from the 2013-14 and 2018 Zambia Demographic and Health Surveys (ZDHSs) and applying multilevel analysis to investigate the association between HIV infection and individual and neighbourhood-level wealth, employment, and education among young women and men two decades after the peak of the epidemic.

Data source and study population
We analysed data from the 2013-14 and 2018 Zambia Demographic and Health Surveys (ZDHS), which are population-based cross-sectional surveys with a two-stage stratified cluster sample design. The eligible study population included women aged 15-49 years and men aged 15-59 years who were either permanent residents of the sampled households or visitors present in the households on the night before the survey [19,20]. In 2013-14, a total of 16,411 women and 14,773 men were successfully interviewed, yielding response rates of 96.2% and 91.1%, respectively. Similarly, in 2018, 13,683 women and 12,132 men were successfully interviewed, yielding response rates of 96.4% and 91.6%, respectively. The principal reason for nonresponse among eligible adults was failure to find individuals at home despite repeated visits, followed by refusal to be interviewed. In 2013-14, of the respondents who completed the structured interview, 29,007 respondents aged 15-59 years were tested for HIV, corresponding to a response rate of 87.2%. Young people aged 15-24 years accounted for 39.9% of the people tested, or 11,571 in total. Further details about participation in the survey and HIV testing response rates can be found in the 2013-14 ZDHS report [19]. In 2018, 91% of respondents were tested for HIV (24,702), with young people accounting for 41.1% of the people tested, or a total of 10,154. Further details about participation in the survey and HIV testing response rates can be found in the 2018 ZDHS report [20].
The 2013-14 ZDHS HIV testing protocol allowed for linking of the HIV results to the sociodemographic data. The HIV laboratory testing algorithm used for the DHS surveys followed the 2005 UNAIDS/WHO guidelines for HIV testing in population-based surveys. Eligible women and men who consented to HIV testing were asked to voluntarily provide about five drops of blood from a finger prick for anonymous testing. Each dried blood spot (DBS) sample was given a bar code label, a duplicate label was attached to the individual's questionnaire, and a third bar code was attached to the blood sample transmittal form to track the blood sample from the field to the central laboratory [19]. This testing algorithm included the Vironostika HIV antigen/antibody combination assay (Biomerieux) as the first assay and the Enzygnost HIV Integral II assay (Dade Behring) as the second confirmatory testing assay. Both tests are fourth-generation enzyme-linked immunosorbent assays (ELISAs) [21]. Western Blot was used as a third confirmatory test. Further details about the testing methodology can be found in the 2013-14 ZDHS report and in an article we published, which is based on the same data [19,22].
For respondents who wanted to know their HIV status, home-based counselling and testing were offered in parallel following the national HIV testing algorithm to ascertain HIV infection status. Respondents were tested for HIV using Determine™ HIV-1/2 (Alere Healthcare) and Uni-Gold™ (Trinity Biotechnology) rapid diagnostic tests, which were run concurrently.
In 2015, the UNAIDS/WHO guidelines for HIV testing in population-based surveys were revised, taking into account concerns that enzyme immunoassay (EIA) testing strategies may result in overestimation of HIV prevalence. The new guidelines recommend the use of a more specific confirmatory assay to assess all enzyme immunoassay-reactive specimens [23]. The 2013-14 ZDHS blood specimens were retested in 2017 following these new guidelines to assess the extent of overestimation and to obtain a confirmed HIV prevalence estimate.
"Confirmed" refers to an estimate verified with a more specific assay. DBS specimens from respondents who were classified as HIV positive according to the original survey laboratory testing algorithm and who had one or two negative home-based rapid diagnostic tests (RDTs) conducted in the field, or no RDT results at all, were tested using the Inno-Lia HIV I/II assay. The results of the Inno-Lia assay were taken as the final confirmatory result. Blood specimens from laboratory-positive respondents with a positive RDT result were considered confirmed positive with no Inno-Lia testing [24].
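The retesting rule described above can be written out as a small decision function. This is only an illustrative sketch: the function and argument names are hypothetical, and it assumes that specimens that were laboratory negative under the original algorithm simply remain negative, which the text does not state explicitly.

```python
from typing import Optional, Sequence


def confirmed_hiv_status(
    lab_positive: bool,
    rdt_results: Sequence[bool],
    inno_lia_positive: Optional[bool] = None,
) -> Optional[bool]:
    """Return True (confirmed positive), False (negative) or None (Inno-Lia result pending).

    lab_positive: result of the original survey laboratory algorithm.
    rdt_results: home-based rapid test results (True = reactive); may be empty.
    inno_lia_positive: Inno-Lia result, if the specimen was retested.
    """
    if not lab_positive:
        # Assumption: only laboratory positives were re-examined.
        return False
    # One or two negative RDTs, or no RDT result at all, triggers Inno-Lia testing.
    needs_inno_lia = len(rdt_results) == 0 or not all(rdt_results)
    if not needs_inno_lia:
        # Laboratory positive with positive RDTs: confirmed without retesting.
        return True
    # The Inno-Lia result is taken as final (None if it is not yet available).
    return inno_lia_positive
```

For example, a laboratory-positive specimen with one reactive and one non-reactive rapid test is classified solely by its Inno-Lia result.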
The 2018 ZDHS HIV testing protocol also allowed for linking of the HIV results to the socio-demographic data. The testing algorithm for this survey included two Enzyme immunoassays (EIAs) namely, Bioelisa HIV-1+2 Ag/Ab assay (Biokit, Spain) as the first assay and Genscreen ULTRA Ag/Ab (Bio-rad, France) as the second confirmatory testing assay. The Geenius HIV 1/2 supplementary assay was used as a confirmatory test of all double EIA positive samples. Further details about the testing methodology can be found in the 2018 ZDHS report [20].

Variable definition
The selection of variables to include in the analysis was based on the proximate determinants framework for factors affecting the risk of transmission of HIV developed by Boerma and Weir [25]. According to this conceptual framework, underlying variables such as socioeconomic status influence proximate determinants, which in turn have an effect on biological mechanisms to influence health outcomes (i.e., HIV infection). The proximate determinants include, among others, concurrent sexual partnerships, number of sexual partners and condom use. The conceptual framework helps to understand the causal pathways from distal socioeconomic factors to HIV infection. The focus was on underlying determinants, i.e., the socioeconomic context of the neighbourhood (neighbourhood educational attainment, wealth, and employment status) and individual demographic and socioeconomic factors. The operational definitions are as follows.
Outcome variable. The dependent variable for this study was HIV status, which is defined as serostatus determined by testing blood samples collected from each consenting individual (0 indicating "HIV negative" and 1 "HIV positive").
Independent variables. Individual level. The main variables of interest were educational attainment, wealth, and employment status. The wealth index score from the ZDHS was used, and separate wealth tertiles for urban and rural populations were created to reduce residence bias. The wealth score was calculated from the first component of principal component analysis (PCA) of household assets, housing characteristics, and access-to-amenities data (e.g., roof and floor material, electricity, water supply, possession of goods such as a bicycle and television, and so forth). The methodology used to calculate the wealth index is based on the Filmer and Pritchett approach, and a detailed explanation is available in the Measure DHS report [26]. Educational attainment was defined as the reported number of years spent in school, and this was used as a continuous variable in the regression model. Employment status was measured by asking respondents if they had been working in the 12 months preceding the survey, and this was then categorised as unemployed or employed. Occupation was defined as the main type of work done in the 12 months preceding the survey, and this was classified using the ILO International Standard Classification of Occupation Codes (ISCO). However, this variable was not included in the regression model, as it partly represents the same information as employment status. In addition, we included sex, age, marital status and residence in the regression models to adjust for potential confounding.
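To make the residence-specific wealth tertiles concrete, the sketch below assigns each respondent a tertile using cut points computed separately for urban and rural respondents. It is a minimal illustration rather than the survey's actual procedure: the field names (`residence`, `wealth_score`) are hypothetical, sampling weights are ignored, and the cut points use the default interpolation of Python's `statistics.quantiles`.

```python
from statistics import quantiles


def wealth_tertiles(records):
    """Label each record low/medium/high using tertiles within its residence group.

    records: list of dicts with hypothetical keys 'residence' and 'wealth_score'.
    Returns the labels in the same order as the input records.
    """
    cut_points = {}
    for area in {r["residence"] for r in records}:
        scores = [r["wealth_score"] for r in records if r["residence"] == area]
        # Two cut points split this residence group into three equal-sized parts.
        cut_points[area] = quantiles(scores, n=3)

    labels = []
    for r in records:
        low_cut, high_cut = cut_points[r["residence"]]
        score = r["wealth_score"]
        if score <= low_cut:
            labels.append("low")
        elif score <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels
```

Because the cut points are computed within each residence group, a rural household can fall in the "high" tertile even when its score is below that of every urban household, which is the residence bias the separate tertiles are meant to reduce.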
Neighbourhood level. In this study, enumeration areas were used as a proxy for neighbourhoods. An enumeration area is the smallest geographic area used in census of populations and other types of surveys, with an average size of 130 households or 600 people. People residing in these areas usually have similar characteristics. Variables describing the characteristics of the neighbourhoods were derived by aggregating individual responses within each enumeration area (cluster) for all respondents and then categorising the means or proportions into three levels (low, medium, and high). Neighbourhood educational attainment was derived by calculating the mean number of years of school attained by the individuals who were interviewed in each enumeration area. Neighbourhood employment was the proportion of individuals interviewed in each enumeration area who were categorised as employed in the last 12 months. Neighbourhood relative wealth was derived by calculating the mean of the wealth scores of all respondents in the enumeration area.
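The aggregation step can be sketched as follows. The text does not state how the three levels were cut, so this example assumes tertiles of the cluster-level means; the field names (`cluster`, and whatever key is aggregated) are hypothetical, and sampling weights are ignored.

```python
from collections import defaultdict
from statistics import mean, quantiles


def neighbourhood_levels(records, key):
    """Aggregate an individual-level variable by cluster and categorise the result.

    records: list of dicts with a hypothetical 'cluster' field plus `key`.
    key: variable to aggregate, e.g. years of schooling, wealth score, or a
         0/1 employment indicator (whose mean is the proportion employed).
    Returns a dict mapping each cluster to 'low', 'medium' or 'high'.
    """
    values_by_cluster = defaultdict(list)
    for r in records:
        values_by_cluster[r["cluster"]].append(r[key])

    cluster_means = {c: mean(v) for c, v in values_by_cluster.items()}
    # Assumption: the three levels are tertiles of the cluster means.
    low_cut, high_cut = quantiles(list(cluster_means.values()), n=3)

    levels = {}
    for cluster, value in cluster_means.items():
        if value <= low_cut:
            levels[cluster] = "low"
        elif value <= high_cut:
            levels[cluster] = "medium"
        else:
            levels[cluster] = "high"
    return levels
```

Each respondent then inherits the level of the enumeration area in which they were interviewed, so the neighbourhood variable is constant within a cluster.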

Analysis
We restricted the analyses to young people (15-24 years). We conducted the analyses of both sexes and then stratified by women and men. Analyses were carried out in two phases. In the first phase we explored bivariate associations between HIV serostatus and individual demographic and socioeconomic factors, and neighbourhood-level socioeconomic factors. In the second phase, we used multilevel mixed-effects logistic regression analyses on HIV status with associated factors, which we conducted using Stata 15. Multilevel mixed effects logistic regression takes into account both individual and neighbourhood-level variability. The bivariate and multivariate analyses also controlled for the potential confounder age, which was adjusted for as a linear effect. Collinearity was assessed using the variance inflation factor, and after concluding that there was limited collinearity, all the demographic and socioeconomic variables explored in the bivariate analysis were included in the multivariate analysis. We tested five separate models to examine the association between HIV prevalence and socioeconomic factors. Model 1 was the null or random intercept-only model, which did not include any socioeconomic or demographic factors. Subsequent models gradually included socioeconomic and demographic variables, starting with individual-level variables only, then neighbourhood-level variables only, and finally included both individual and neighbourhood-level variables (corresponding to model 2, model 3 and model 4, respectively). The final model included only significant variables at both individual and neighbourhood level. The likelihood-ratio test was used for assessing the goodness-of-fit of the models and to determine whether adding independent variables to the intercept-only model significantly improved the fit.
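The likelihood-ratio comparison of nested models can be illustrated with a short standard-library sketch. It assumes both models were fitted by maximum likelihood to the same observations and differ by `extra_params` parameters; the function names are hypothetical, and any log-likelihood values passed in would come from the fitted models (for example, as reported by Stata), not from this code.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed-form starting values for df = 1 (odd) and df = 2 (even).
    if df % 2 == 0:
        q, k = math.exp(-half), 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1
    # Recurrence: Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


def likelihood_ratio_test(loglik_reduced, loglik_full, extra_params):
    """Return the LR statistic and its chi-square p-value for nested models."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    return statistic, chi2_sf(statistic, extra_params)
```

A small p-value indicates that the added variables improve the fit over the reduced model. When the parameter being tested is a variance component at the boundary of its range, this naive chi-square p-value is conservative.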
Multilevel statistics estimated were the explained variance, the intraclass correlation (ICC), and log-likelihood tests. To estimate explained variance ($R^2$), we used the approach explained in Hox (2010) and proposed by Snijders and Bosker (1999) for multilevel mixed effects logistic regression models, which does not rely on the likelihood. The estimated variance is decomposed into the lowest-level residual variance ($\sigma^2_R$), which is fixed to $\pi^2/3 = 3.29$ in logistic models, the second-level variance ($\tau^2_0$) and the variance of the linear predictor from the fixed part of the model ($\sigma^2_F$) [27]. The explained variance is estimated using the formula:

$$R^2 = \frac{\sigma^2_F}{\sigma^2_F + \tau^2_0 + \sigma^2_R}$$

where $\sigma^2_F$ is the explained part of the total variance, $\tau^2_0$ is the unexplained variance at the neighbourhood level, and $\sigma^2_R$ is the unexplained variance at the individual level (assumed to be $\pi^2/3 = 3.29$). The ICC or rho indicates the proportion of the variance explained by the grouping structure in the population, which in our case is the neighbourhoods [27]. This is the same as the unexplained neighbourhood-level variance as a proportion of total variance. We tested for interactions between individual and neighbourhood variables in the final model, but none were found to be statistically significant.
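As a worked illustration of the variance decomposition, the sketch below computes the explained variance and the neighbourhood share from the three components. It reads the decomposition as $R^2 = \sigma^2_F / (\sigma^2_F + \tau^2_0 + \sigma^2_R)$ and the ICC as $\tau^2_0$ over the same total, which is our interpretation of the definitions given above; the function names are hypothetical and no values from the study are used.

```python
import math

# Individual-level residual variance of the logistic distribution, about 3.29.
RESIDUAL_VARIANCE = math.pi ** 2 / 3


def explained_variance(sigma2_f, tau2_0, sigma2_r=RESIDUAL_VARIANCE):
    """Share of total variance captured by the fixed part of the model."""
    return sigma2_f / (sigma2_f + tau2_0 + sigma2_r)


def neighbourhood_share(sigma2_f, tau2_0, sigma2_r=RESIDUAL_VARIANCE):
    """Unexplained neighbourhood-level variance as a share of total variance.

    In the intercept-only model sigma2_f is 0, so this reduces to the usual
    intraclass correlation tau2_0 / (tau2_0 + pi^2 / 3).
    """
    return tau2_0 / (sigma2_f + tau2_0 + sigma2_r)
```

For instance, an intercept-only model with a neighbourhood variance equal to $\pi^2/3$ would give an ICC of exactly 0.5; adding predictors raises $\sigma^2_F$ and typically lowers $\tau^2_0$, so the neighbourhood share shrinks.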

Ethics
Ethical approval for the 2013-14 ZDHS was obtained from the Tropical Disease and Research Centre (TDRC) ethical committee, the institutional board of ICF International, and the Centers for Disease Control and Prevention (CDC) Atlanta research ethics review board. Ethical approval for the 2018 ZDHS was obtained from the Tropical Disease and Research Centre (TDRC) ethical committee and the institutional board of ICF International. In both the 2013-14 and 2018 surveys, participation was obtained by soliciting verbal informed and voluntary consent. Participants were informed that the survey's HIV testing results were anonymous. Home-based counselling and testing were offered in parallel to participants following the national HIV testing algorithm to ascertain HIV infection status for respondents who consented and wanted to know their HIV status. Concurrent HIV testing with Determine™ HIV-1/2 (Alere Healthcare) and Uni-Gold™ (Trinity Biotechnology) was the home-based testing procedure, and nurses and lay counsellors provided pre- and post-test counselling. If either of the rapid tests was HIV reactive (positive), the respondent was referred to the nearest health facility for further assessment, treatment, and care.

Demographic characteristics of the study participants
In total, 11,571 young people aged 15-24 years were interviewed and consented to HIV testing in 2013-14, while 10,154 were interviewed and tested in 2018. More than fifty percent were living in rural areas in both study years. There were more young women than men in the study population (52% and 53% women, compared to 48% and 47% men, in 2013-14 and 2018, respectively) (see S1 & S2 Tables).

HIV prevalence and associated factors
An estimated 4.5% of the young people were HIV positive in 2013-14 and this declined to 3.8% in 2018. The prevalence was notably higher among young people residing in urban areas than in rural areas (6.6% and 5.3% compared with 2.4% and 2.6%, respectively). Age was associated with higher odds of being infected. Young people aged 20-24 were more likely to be infected than those in the younger age group of 15-19 years. Formerly married young people were at increased odds of being infected (aOR 3.25, 95% CI 2.26-4.67 and aOR 4.03, 95% CI 2.66-6.10) compared to the never-married (formerly married include those who were separated, divorced and widowed). Conversely, married women had lower odds of being infected (aOR 0.80, 95% CI 0.63-1.03 and aOR 0.82, 95% CI 0.61-1.11) (Table 2).
HIV prevalence and age-adjusted odds ratios varied by the selected socioeconomic factors. Education was associated with reduced odds of infection, particularly for young men in both 2013-14 and 2018. Among both young men and women, those with the highest wealth status had higher odds of being infected than those in the poorest tertile (aOR 2.33, 95% CI 1.36-3.99 for men and aOR 2.70, 95% CI 2.00-3.65 for women in 2013-14; aOR 2.51, 95% CI 1.29-4.88 for men and aOR 1.79, 95% CI 1.25-2.54 for women in 2018) (Table 2). Employment was associated with lower odds of being infected (aOR 0.67, 95% CI 0.55-0.80 and aOR 0.75, 95% CI 0.60-0.95) compared to the unemployed (Table 1). The agricultural occupation was associated with lower odds of HIV infection among young men and women in both years, compared to those who were unemployed. However, professional occupation was associated with higher odds of being infected among both young men and women. Living in neighbourhoods with a high proportion of employed residents was associated with lower odds of HIV infection among young women and men. Conversely, living in a neighbourhood with high wealth and education status was associated with higher odds of HIV infection (Table 2).
The intercept-only model showed that the overall HIV prevalence varied across neighbourhoods with a neighbourhood difference of 14% in 2013-14 (results not shown) (see S3 Table). In Model 2, the addition of individual-level variables to the intercept-only model explained more of the variance than the addition of neighbourhood variables (it explained a variance of 62% and a neighbourhood difference of 7% versus 9% and 8%, respectively). The inclusion of both individual and neighbourhood variables in Model 4 explained 62.4% of the variance in HIV prevalence, and the likelihood-ratio test showed that the fit of the model significantly improved compared to the intercept model. The following variables were associated with higher odds of being HIV positive in Model 4: age, urban residence, formerly married, medium wealth and residing in neighbourhoods with high educational attainment and medium-level employment. Educational attainment at the individual level and employment were significantly associated with lower odds of being infected (results not shown) (see S3 Table).
For 2018, the intercept-only model showed 13% variation in HIV prevalence across neighbourhoods (results not shown) (see S3 Table). In Model 2, the addition of individual-level variables to the intercept-only model explained 58% of the variance, with a neighbourhood difference of 11%. The inclusion of both individual and neighbourhood variables in Model 4 explained 59% of the variance in HIV prevalence, and the likelihood-ratio test showed that the fit of the model significantly improved compared to the intercept model. The following variables were associated with higher odds of being HIV positive in Model 4: age, urban residence, married or formerly married, and residing in neighbourhoods with medium-level wealth. Employment was significantly associated with lower odds of being infected (results not shown) (see S3 Table).
Among young males, the intercept-only model shows that HIV prevalence varied by 13% across neighbourhoods in 2013-14. The inclusion of both individual and neighbourhood variables explained 59.5% of the variance in HIV prevalence, with a neighbourhood difference of 2%. The likelihood-ratio test showed that the inclusion of both individual and neighbourhood variables to the intercept model improved the fit of the model. Factors associated with higher HIV infection were age, urban residence and high neighbourhood education and wealth. Individual educational attainment and employment were associated with reduced odds of HIV infection (Table 3).
The intercept-only model in 2018 shows that HIV prevalence varied by 20% across neighbourhoods. The inclusion of both individual and neighbourhood variables explained 44.3% of the variance in HIV prevalence, with a neighbourhood difference of 17%. The likelihood-ratio test showed that the inclusion of both individual and neighbourhood variables to the intercept model improved the fit of the model. Factors associated with higher HIV infection were age and being formerly married. Individual educational attainment was associated with reduced odds of HIV infection (Table 3).
Model 1 in 2013-14 showed that HIV prevalence among females varied across neighbourhoods, with a statistically significant neighbourhood difference of 11%. The addition of both individual and neighbourhood variables in Model 4 explained 65.7% of the variance, with a neighbourhood difference of 4%. The likelihood-ratio test showed that the inclusion of both individual and neighbourhood variables to the intercept model improved the fit of the model. Residing in urban areas, older age, being formerly married, wealth and residing in neighbourhoods with medium-level employment were associated with higher odds of being HIV positive. Educational attainment at the individual level was associated with lower odds of being infected. High neighbourhood education was associated with increased odds of HIV infection (Table 4).
For 2018 the intercept-only model showed a higher variation in HIV prevalence across neighbourhoods of 14%. The addition of both individual and neighbourhood variables in Model 4 explained 64.6% of the variance, with a neighbourhood difference of 11%. The likelihood-ratio test showed that the inclusion of both individual and neighbourhood variables to the intercept model improved the fit of the model. Older age, being formerly married and residing in neighbourhoods with medium-level wealth were associated with higher odds of being HIV positive among females (Table 4).

Discussion
Our analysis indicates that both individual and neighbourhood socioeconomic factors influence the likelihood of young people being infected with HIV in Zambia. At the individual level, older age and urban residence were found to be associated with increased odds of infection for both young men and women. Formerly married women were also at increased odds of HIV infection in both 2013-14 and 2018, while being formerly married was associated with increased HIV risk among young men in 2018. Education was associated with a reduced risk of HIV infection for both sexes. Young employed men were also at reduced risk of infection. In the full multivariate model, only neighbourhood education was associated with higher odds of HIV infection.
Education was found to reduce the risk of infection in young people. An additional year of educational attainment reduced the odds of being infected by 11% and 10% for young men and 4% for young women in 2013-14 and 2018, respectively. However, the strength of association reduced in 2018 among young women. As mentioned earlier, some studies have documented similar findings [10,15,16]. A systematic review assessing the association between education and HIV by Hargreaves et al. found that studies in sub-Saharan Africa done from 1996 onwards showed a lower risk of infection among the most educated [16]. We also found that in the multivariate analysis, after adjusting for neighbourhood factors, the protective effect of individual education was maintained. This may reflect that young people with high education feel more empowered to make decisions that can protect them from HIV infection than less-educated young people. Educated persons also tend to be early adopters of new practices; they internalize relevant information and translate this knowledge into behavioural change [28].
It has been postulated that the spread of education changes the community environment and promotes socially acceptable behaviour, i.e., delayed sexual debut, limiting sexual partners, etc. [28]. Therefore, people living in neighbourhoods with high educational attainment could be expected to be more likely to adopt health-promoting behaviours. Thus, it is logical to expect that neighbourhood-level educational attainment might also be protective. On the contrary, our study found that neighbourhood-level educational attainment was associated with increased risk of HIV infection. These findings contradict the findings from the earlier studies done in selected areas of Zambia, which found that neighbourhood education was associated with a reduced risk of HIV infection. It could be that the two studies were not representative of the whole population or that the contextual factors affecting risk are not static and may change in their effect as the HIV epidemic evolves. Living in a neighbourhood with medium-level employment and wealth was found to be a predictor of higher odds of HIV infection for young women in 2013-14 and 2018, respectively. Neighbourhoods with a medium proportion of persons employed may be more heterogeneous in terms of the socioeconomic status of the inhabitants and have more disposable resources available than neighbourhoods with higher levels of unemployment, thereby negatively affecting young people's sexual behaviour. We postulate that those with more resources in communities where others are unemployed may be more likely to engage in particular risky lifestyles, such as higher rates of partner change and multiple sexual partners, because of greater autonomy and spatial mobility. The statistical evidence of a positive association between neighbourhood employment and high HIV prevalence is surprising and needs further investigation.
Our findings indicate that the associations with socioeconomic status may change continuously over time, and that there is a need for more nuanced terms to describe the different stages of the HIV epidemic. Further, the findings may also suggest that the association between neighbourhood-level socio-economic factors and HIV prevalence is complex. The correlation between neighbourhood wealth and education suggests that the two SEP indicators capture partially similar underlying contextual aspects of the neighbourhood.
Our study has some limitations. As in any cross-sectional study, the exposures and outcome were measured simultaneously in this given population. This limits our ability to draw causal inferences about the associations found. However, since the study was limited to young people, HIV infection was likely to be recent. The assumption is that they have recently become sexually active and are not likely to be greatly affected by mortality and treatment, and only a small proportion would have been infected by their mother. According to the Zambia population-based HIV impact assessment (ZAMPHIA) survey, about 1% of children (0-14 years) were HIV positive, most likely infected by their mothers [29]. Some important factors may have biased the associations with neighbourhood characteristics in our study, such as using enumeration areas as a proxy for neighbourhoods. These clusters are not necessarily representative of the naturally occurring neighbourhoods or residential areas where individuals reside. Studies have, however, found them to be small enough to be a useful proxy [30]. A high proportion of the total variance was not explained by the neighbourhood variables used. This could be partly because there was no information on more specific community-level characteristics that have been associated with HIV prevalence in other studies, such as the availability of markets, health facilities, bars, and proximity to trading areas and major roads. Mobility, which has been linked to the risk of HIV infection in numerous studies, was also not measured in our study, and thus we do not know how long respondents had lived in the neighbourhood where they were interviewed. Another limitation is that the only neighbourhood factors measured are aggregates of individual responses within each enumeration area, which is the proxy neighbourhood.

Conclusion
This study, using nationally representative data, indicates that individual socioeconomic factors are more strongly associated with HIV infection than neighbourhood factors among young people in Zambia. Individual-level education remains an important socioeconomic factor associated with reduced odds of HIV infection. This suggests that the HIV response in Zambia should still focus on individual-level prevention strategies. Further, the contrasting associations of HIV infection risk with neighbourhood-level education and individual-level education suggest a complex relationship between the two levels that requires further investigation.
Supporting information S1